Abstract

In earlier work [1], we gave an oracle separating the relational versions of BQP and the
polynomial hierarchy, and showed that an oracle separating the decision versions would follow
from what we called the Generalized Linial-Nisan (GLN) Conjecture: that "almost k-wise
independent" distributions are indistinguishable from the uniform distribution by constant-depth
circuits. The original Linial-Nisan Conjecture was recently proved by Braverman [7]; we offered 
a $200 prize for the generalized version. In this paper, we save ourselves $200 by showing that 
the GLN Conjecture is false, at least for circuits of depth 3 and higher. 

As a byproduct, our counterexample also implies that $\Pi_2^p \not\subset P^{NP}$ relative to a random oracle
with probability 1. It has been conjectured since the 1980s that PH is infinite relative to a 
random oracle, but the highest levels of PH previously proved separate were NP and coNP. 

Finally, our counterexample implies that the famous results of Linial, Mansour, and Nisan
[12], on the structure of AC^0 functions, cannot be improved in several interesting respects.

1 Introduction 

Proving an oracle separation between BQP and PH is one of the central open problems of quantum 
complexity theory. In a recent paper [1], we reported the following progress on the problem:

(1) We constructed an oracle relative to which $FBQP \not\subset FBPP^{PH}$, where FBQP and $FBPP^{PH}$ are
the "relational" versions of BQP and PH respectively (that is, the versions where there are 
many valid outputs, and an algorithm's task is to output any one of them). 

(2) We proposed a natural decision problem, called Fourier Checking, which is provably in 
BQP (as an oracle problem) and which we conjectured was not in PH. 

(3) We showed that Fourier Checking has a property called almost k-wise independence, and 
that no $BPP_{path}$ or SZK problem shares that property. This allowed us to give oracles relative
to which BQP was outside those classes, and to reprove all the known oracle separations 
between BQP and classical complexity classes in a unified way. 

(4) We conjectured that no PH problem has the almost k-wise independence property, and called
that the Generalized Linial-Nisan (GLN) Conjecture. Proving the GLN Conjecture would
imply the existence of an oracle relative to which $BQP \not\subset PH$.

This paper does nothing to modify points (1)-(3) above: the unconditional results in [1] are still
true, and we still conjecture not only that there exists an oracle relative to which $BQP \not\subset PH$, but
that Fourier Checking is such an oracle.

However, we will show that the hope of proving Fourier Checking $\notin$ PH by proving the GLN
Conjecture was unfounded:

The GLN Conjecture is false, at least for the second and higher levels of the polynomial
hierarchy.

We prove this by giving an explicit counterexample: a family of depth-three AC^0 circuits that
distinguish the uniform distribution over n-bit strings from an $\tilde{O}(k/n)$-almost k-wise independent
distribution, with constant bias.$^1$

Our counterexample was inspired by a recent result of Beame and Machmouchi [3], giving a
Boolean function with quantum query complexity $\Omega(n/\log n)$ that is computable by a depth-three
AC^0 circuit. This disproved a conjecture, relayed to us earlier by Beame, stating that every AC^0
function has quantum query complexity $n^{1-\Omega(1)}$. Like the Beame-Machmouchi counterexample,
ours involves inputs $X = x_1 \ldots x_N \in [M]^N$ that are lists of positive integers, with the $x_i$'s encoded
in binary to obtain a Boolean problem; as well as a function $f : [M]^N \to \{0,1\}$ that uses two
alternating quantifiers to express a "global" property of X. In Beame and Machmouchi's case, the
property in question was that the function $x(i) := x_i$ is 2-to-1; in our case, the property is that
$x(i)$ is surjective.$^2$

Our counterexample makes essential use of depth-three circuits, and we find it plausible that
the GLN Conjecture still holds for depth-two circuits (i.e., for DNF formulas).$^3$ As shown in
[1], proving the GLN Conjecture for depth-two circuits would yield an oracle relative to which
$BQP \not\subset AM$, which is already a longstanding open problem.

Given that the GLN Conjecture resisted attacks for two years (and indirectly motivated the
beautiful works of Razborov [16] and Braverman [7] on the original LN Conjecture), our
counterexample cannot have been quite as obvious as it seems in retrospect! Perhaps Andy Drucker
(personal communication) summarized the situation best: almost k-wise independent distributions
seem to be much better at fooling people than at fooling circuits.

1.1 Further Implications 

Besides falsifying the GLN Conjecture, our counterexample has several other interesting
implications for PH and AC^0.

Firstly, we are able to use our counterexample to prove that $(\Pi_2^p)^A \not\subset P^{NP^A}$ with probability 1
relative to a random oracle A. Indeed, we conjecture that our counterexample can even be used
to prove $(\Pi_2^p)^A \not\subset (\Sigma_2^p)^A$ with probability 1 for a random oracle A. The seminal work of Yao
[18] showed PH infinite relative to some oracle, but it has been an open problem for almost thirty
years to prove PH infinite relative to a random oracle (see the book of Hastad [17] for discussion).
Motivation for this problem comes from a surprising result of Book [6], which says that if PH

1 Note that depth-three AC^0 circuits correspond to the second level of PH, depth-four circuits correspond to the
third level, and so on. 

2 Beame and Machmouchi [3] also mention the surjectivity property, in Corollary 6 of their paper. 
3 Indeed, we originally formulated the conjecture for depth-two circuits only, before (rashly) extending it to arbi- 
trary depths. 
collapses relative to a random oracle, then it also collapses in the unrelativized world. Our result,
while simple, appears to represent the first "progress" toward separating PH by random oracles
since the original result of Bennett and Gill [5] that $P \ne NP \ne coNP$ relative to a random oracle
with probability 1.$^4$

Secondly, our counterexample shows that the celebrated results of Linial, Mansour, and Nisan
[12], on the Fourier spectrum of AC^0 functions, cannot be improved in several important respects.
In particular, Linial et al. showed that every Boolean function $f : \{0,1\}^n \to \{0,1\}$ in AC^0 has
average sensitivity $O(\mathrm{polylog}(n))$. However, we observe that this result fails completely if we
consider a closely-related measure, the average block-sensitivity. Indeed, there exists a
reasonably-balanced Boolean function $f \in AC^0$ such that every 1-input can be modified in $\Omega(n/\log n)$ disjoint
ways to produce a 0-input, and almost every 0-input can be modified in $\Omega(n/\log n)$ disjoint ways
to produce a 1-input. What makes this behavior interesting is that one normally associates it with
(say) Parity, the canonical function not in AC^0!

Linial et al. [12] also showed that every Boolean function $f \in AC^0$ has a low-degree approximating
polynomial: that is, a real polynomial $p : \{0,1\}^n \to \mathbb{R}$, of degree $O(\mathrm{polylog}(n))$, such that

\[ \mathop{E}_{X \in \{0,1\}^n} \left[ (p(X) - f(X))^2 \right] = o(1). \]

However, using our counterexample, we will show that such a polynomial p cannot generally be
written as a linear combination of terms, $p = \sum_C \alpha_C C$, where the coefficients satisfy the following
bound:

\[ \sum_C |\alpha_C| \, 2^{-|C|} = n^{o(1)}. \]

In other words, such a polynomial cannot be "low-fat" in the sense defined by Aaronson [1], but
must instead involve "massive cancellations" between positive and negative terms. This gives
the first example of a Boolean function f that can be approximated in L2-norm by a low-degree
polynomial, but not by a low-degree low-fat polynomial, thereby answering another one of the
open questions from [1].



1.2 The Future of BQP and PH 

While this paper rules out the GLN approach, at least three plausible avenues remain for proving 
an oracle separation between BQP and PH. 

(1) Our original idea for proving Fourier Checking $\notin$ PH was to use a direct random restriction
argument — and while we were unable to make such an argument work, we have also found 
nothing to rule it out. 

(2) Besides almost k-wise independence, the other "obvious" property of Fourier Checking
that might be useful for lower bounds is its close connection with the Majority function.
Indeed, given as input the truth table of a Boolean function $f : \{0,1\}^n \to \{-1,1\}$, estimating a
single Fourier coefficient $\hat{f}(s) := \sum_x (-1)^{x \cdot s} f(x)$ (up to normalization) is easily seen to be equivalent to solving

4 Though "working from the opposite direction," Cai [9] proved the beautiful result that $PH \ne PSPACE$ relative
to a random oracle with probability 1. Note that any relativized world where PH is infinite must also satisfy
$PH \ne PSPACE$. Cai [8] also proved that BH is infinite with probability 1, where BH represents the Boolean hierarchy
over NP, a subclass of $P^{NP}$.
Majority, which is known to be hard for AC^0. Thus, in proving Fourier Checking $\notin$ PH,
the difficulty is "merely" to show that checking the answers to $2^n$ overlapping Majority
instances is not significantly easier for an AC^0 circuit than checking the answer to one instance.
While the usual hybrid argument fails in this case, one could hope for some other reduction
(possibly a non-black-box reduction) showing that if Fourier Checking is in AC^0, then
Majority is as well.

(3) Recently, Fefferman and Umans [10] proposed a beautiful alternative approach to the rel- 
ativized BQP versus PH question. Like approach (2) above, their approach is based on a 
hoped-for reduction from Majority. However, they replace Fourier Checking by a dif- 
ferent candidate problem, which involves Nisan-Wigderson combinatorial designs [15] rather 
than the Fourier transform. They show that their candidate problem is in BQP, and also 
show that it is not in PH, assuming (roughly speaking) that the analysis of the NW generator 
can be improved in a direction that people have wanted to improve it in for independent 
reasons. Fefferman and Umans' conjecture follows from the GLN Conjecture,$^5$ but is much
more tailored to a specific pseudorandom generator, and is completely unaffected by our
counterexample.

1.3 Organization 

The rest of the paper is organized as follows. Section 2 provides background on AC^0, (almost)
k-wise independence, and the (Generalized) Linial-Nisan Conjecture; then Section 3 presents our
counterexample. Section 4 uses the counterexample to prove that $\Pi_2^p \not\subset P^{NP}$ relative to a
random oracle, and Section 5 gives implications of the counterexample for the noise sensitivity and
approximate degree of AC^0 functions. Section 6 concludes with some discussion and open problems.

2 Background 

We refer the reader to [1] for details on the original and generalized Linial-Nisan Conjectures, as
well as their relationship to BQP and PH. In this section, we give a brief recap of the definitions, 
conjectures, and results that are relevant to our counterexample. 

By AC^0, we mean the class of Boolean function families $\{f_n\}_{n \ge 1}$ such that each $f_n : \{0,1\}^n \to
\{0,1\}$ is computable by a circuit of AND, OR, and NOT gates with constant depth, unbounded
fanin, and size $n^{O(1)}$. Here depth means the number of alternating layers of AND and OR gates;
NOT gates are not counted. Abusing notation, we will often use phrases like "AC^0 circuit of size
$2^{n^{o(1)}}$," which means the size is now superpolynomial but the depth is still O(1). We will also
generally drop the subscript of n.

Throughout the paper we abbreviate probability expressions such as $\Pr_{X \sim \mathcal{D}}[f(X)]$ by $\Pr_{\mathcal{D}}[f]$.
Let $\mathcal{U}$ be the uniform distribution over n-bit strings, so that $\Pr_{\mathcal{U}}[X] = 1/2^n$ for all $X \in \{0,1\}^n$.
A distribution $\mathcal{D}$ over $\{0,1\}^n$ is called k-wise independent (for $k \le n$) if $\mathcal{D}$ is uniform on every
subset of at most k bits. A central question in pseudorandomness and cryptography is what
computational resources are needed to distinguish such a "pretend-uniform" distribution from the
"truly-uniform" one. In 1990, Linial and Nisan [13] famously conjectured that $n^{\varepsilon}$-wise independence
fools AC^0 circuits:

5 As, indeed, anything follows from the GLN Conjecture.

Conjecture 1 (Linial-Nisan or LN Conjecture) Let $\mathcal{D}$ be any $n^{\Omega(1)}$-wise independent
distribution over $\{0,1\}^n$, and let $f : \{0,1\}^n \to \{0,1\}$ be computed by an AC^0 circuit of size $2^{n^{o(1)}}$.
Then

\[ \left| \Pr_{\mathcal{D}}[f] - \Pr_{\mathcal{U}}[f] \right| = o(1). \]


(The actual parameters in the LN Conjecture are considerably stronger than the above, but 
also more complicated to state. We chose weaker parameters that suffice for our discussion.) 

After seventeen years of almost no progress, Bazzi [2] finally proved Conjecture 1 for the special
case of depth-2 circuits. Shortly afterward, Razborov [16] gave a dramatically simpler proof of
Bazzi's theorem, and shortly after that, Braverman [7] proved the full Conjecture 1.



Theorem 2 (Braverman's Theorem [7]) Let $f : \{0,1\}^n \to \{0,1\}$ be computed by an AC^0
circuit of size S and depth d, and let $\mathcal{D}$ be a $(\log(S/\varepsilon))^{7d^2}$-wise independent distribution over $\{0,1\}^n$.
Then for all sufficiently large S,

\[ \left| \Pr_{\mathcal{D}}[f] - \Pr_{\mathcal{U}}[f] \right| < \varepsilon. \]



Even before the work of Razborov [16] and Braverman [7], we had proposed a deceptively
modest-seeming generalization of Conjecture 1, motivated by the application to the BQP versus
PH question mentioned previously. To state the generalization, we need some more terminology.
Let $X = x_1 \ldots x_n \in \{0,1\}^n$ be a string. Then a literal is an expression of the form $x_i$ or $1 - x_i$,
and a k-term is a product of k literals (each involving a different $x_i$), which is 1 if the literals all
take on prescribed values and 0 otherwise.

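Definition 3 (almost k-wise independence) Given a distribution $\mathcal{D}$ over $\{0,1\}^n$ and a k-term
C, we say that C is $\varepsilon$-fooled by $\mathcal{D}$ if

\[ 1 - \varepsilon \le \frac{\Pr_{\mathcal{D}}[C]}{\Pr_{\mathcal{U}}[C]} \le 1 + \varepsilon. \]

(Note that $\Pr_{\mathcal{U}}[C]$ is just $2^{-k}$.) Then $\mathcal{D}$ is $\varepsilon$-almost k-wise independent if every k-term C is
$\varepsilon$-fooled by $\mathcal{D}$.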



In other words, there should be no assignment to any k bits, such that conditioning on that
assignment gives us much information about whether X was drawn from $\mathcal{D}$ or from $\mathcal{U}$. We can
now state the conjecture that we falsify.

Conjecture 4 (Generalized Linial-Nisan or GLN Conjecture) Let $\mathcal{D}$ be a $1/n^{\Omega(1)}$-almost
$n^{\Omega(1)}$-wise independent distribution over $\{0,1\}^n$, and let $f : \{0,1\}^n \to \{0,1\}$ be computed by an
AC^0 circuit of size $2^{n^{o(1)}}$. Then

\[ \left| \Pr_{\mathcal{D}}[f] - \Pr_{\mathcal{U}}[f] \right| = o(1). \]



Note that, for Conjecture 4 not to be ruled out immediately, it is essential that our definition
of $\varepsilon$-fooling was multiplicative rather than additive. For suppose we had merely required that, on
every subset of indices $S \subseteq [n]$ with $|S| \le k$, the marginal distribution $\mathcal{D}_S$ was $\varepsilon$-close in variation
distance to the uniform distribution. Then it would be easy to construct almost k-wise independent distributions
$\mathcal{D}$ that were distinguishable from the uniform distribution even by DNF formulas. For example,
the uniform distribution over all sequences $X = x_1 \ldots x_N \in [N]^N$ that are permutations (with the
$x_i$'s appropriately coded in binary) is one such $\mathcal{D}$.

This paper shows that, even with the more careful multiplicative definition of $\varepsilon$-fooling, there
is still a counterexample to Conjecture 4, although we have to work harder and use higher-depth
circuits to construct it. The failure of Conjecture 4 means that Braverman's Theorem is "essentially
optimal," in the sense that one cannot relax the k-wise independence condition to almost k-wise
independence. This demonstrates a striking contrast between k-wise independence and almost
k-wise independence in terms of their implications for pseudorandomness.

3 The Counterexample 

Fix a positive integer m, and let $M := 2^m$. Then it will be useful to think of the input $X = x_1 \ldots x_N$
as belonging to the set $[M]^N$, where $N := \lceil Mm \ln 2 \rceil$. However, to make contact with the original
statement of the GLN Conjecture, we can easily encode such an X as an n-bit string where n := Nm,
by writing out each $x_i$ in binary. Abusing notation, we will speak interchangeably about X as an
element of $\{0,1\}^n$ or of $[M]^N$.

Let the image of X, or $\mathrm{Im}_X := \{x_1, \ldots, x_N\}$, be the set of integers that appear in X. Then
define the surjectivity function, $f_{\mathrm{Surj}} : \{0,1\}^n \to \{0,1\}$, by $f_{\mathrm{Surj}}(X) = 1$ if $\mathrm{Im}_X = [M]$ and
$f_{\mathrm{Surj}}(X) = 0$ otherwise. A first easy observation is that $f_{\mathrm{Surj}} \in AC^0$.

Lemma 5 $f_{\mathrm{Surj}}$ is computable by an AC^0 circuit of depth 3 and size $O(NMm)$.

Proof. For all $i \in [N]$ and $y \in [M]$, let $\Delta(x_i, y)$ denote the m-term that evaluates to 1 if $x_i = y$
and to 0 otherwise. Then

\[ f_{\mathrm{Surj}}(X) = \bigwedge_{y \in [M]} \bigvee_{i \in [N]} \Delta(x_i, y). \]

■
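
As an illustrative sketch (not part of the original argument), the depth-3 formula from the proof of Lemma 5 can be evaluated directly. The function names below, and the convention of encoding each $x_i$ as the m-bit binary expansion of $x_i - 1$, are assumptions made here for concreteness.

```python
from itertools import product

def encode(X, m):
    """Encode each x_i in [1, 2^m] as the m bits of x_i - 1 (assumed convention)."""
    return [[((x - 1) >> b) & 1 for b in range(m)] for x in X]

def delta(bits, y, m):
    """The m-term Delta(x_i, y): an AND of m literals, true iff the bits spell y."""
    return all(bits[b] == (((y - 1) >> b) & 1) for b in range(m))

def f_surj_circuit(X, m):
    """Depth 3: AND over y in [M] of OR over i in [N] of the m-term Delta(x_i, y)."""
    M = 2 ** m
    words = encode(X, m)
    return int(all(any(delta(w, y, m) for w in words) for y in range(1, M + 1)))

def f_surj_direct(X, m):
    """Reference definition: 1 iff Im_X = [M]."""
    return int(set(X) == set(range(1, 2 ** m + 1)))

def all_inputs(m, N):
    """Every X in [M]^N, for exhaustive comparison on tiny parameters."""
    return product(range(1, 2 ** m + 1), repeat=N)
```

The circuit uses one top AND gate, M OR gates, and NM bottom terms of fanin m, which is where the $O(NMm)$ size bound comes from.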

Now let $\mathcal{U}$ be the uniform distribution over $[M]^N$, so that $\Pr_{\mathcal{U}}[X] = 1/M^N$ for all $X \in [M]^N$.
Also, given an input $X \in [M]^N$, we define a distribution $\mathcal{D}(X)$ over "perturbed" versions of X via
the following procedure:

(1) Choose y uniformly at random from [M].

(2) For each $i \in [N]$ such that $x_i = y$, change $x_i$ to a uniform, independent sample from $[M] \setminus \{y\}$.

Then we let $\mathcal{D} := \mathcal{D}(\mathcal{U})$ be the distribution over inputs Z obtained by first drawing an X from
$\mathcal{U}$, and then sampling Z from $\mathcal{D}(X)$. Notice that $\mathrm{Im}_Z \ne [M]$ and hence $f_{\mathrm{Surj}}(Z) = 0$ for all Z in the
support of $\mathcal{D}$.

Here is an observation that will be helpful later. Given a sample $Z = z_1 \ldots z_N$ from $\mathcal{D}$, we can
define a distribution $\mathcal{D}^{\mathrm{inv}}(Z)$ over perturbed versions of Z via the following "inverse" procedure:

(1) Choose y uniformly at random from $[M] \setminus \mathrm{Im}_Z$.

(2) For each $i \in [N]$, change $z_i$ to y with independent probability 1/M.

We claim that $\mathcal{D}^{\mathrm{inv}}$ is indeed the inverse of $\mathcal{D}$.
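Claim 6 $\mathcal{D}^{\mathrm{inv}}(\mathcal{D}(\mathcal{U})) = \mathcal{U}$.

Proof. Let $\mathcal{D}_y(X)$ be the variant of $\mathcal{D}(X)$ where we fix the element $y \in [M]$ in step (1), so that
$\mathcal{D}(X) = E_{y \in [M]} \mathcal{D}_y(X)$. Similarly, let $\mathcal{D}^{\mathrm{inv}}_y(Z)$ be the variant of $\mathcal{D}^{\mathrm{inv}}(Z)$ where we fix the element
$y \in [M] \setminus \mathrm{Im}_Z$. Then it is easy to see that, for every fixed $y \in [M]$, we have $\mathcal{D}^{\mathrm{inv}}_y(\mathcal{D}_y(\mathcal{U})) = \mathcal{U}$.
For choosing each $x_i$ uniformly at random, then changing it randomly if it equals y, then changing
it back to y with probability 1/M, is just a more complicated way of choosing $x_i$ uniformly at
random.

Now let Hist(X) be the histogram of X: that is, the multiset $\{h_1, \ldots, h_M\}$ where $h_y :=
|\{i : x_i = y\}|$. Then we can conclude from the above that, for every $y \in [M]$,

\[ \mathrm{Hist}(\mathcal{D}^{\mathrm{inv}}(\mathcal{D}(\mathcal{U}))) = \mathrm{Hist}(\mathcal{D}^{\mathrm{inv}}(\mathcal{D}_y(\mathcal{U}))) = \mathrm{Hist}(\mathcal{D}^{\mathrm{inv}}_y(\mathcal{D}_y(\mathcal{U}))) = \mathrm{Hist}(\mathcal{U}). \]

Call a distribution $\mathcal{A}$ over $[M]^N$ symmetric if $\Pr_{\mathcal{A}}[X]$ depends only on Hist(X). Notice that $\mathcal{U}$ is
symmetric, and that if $\mathcal{A}$ is symmetric, then $\mathcal{D}(\mathcal{A})$ and $\mathcal{D}^{\mathrm{inv}}(\mathcal{A})$ are both symmetric also. This
means that from $\mathrm{Hist}(\mathcal{D}^{\mathrm{inv}}(\mathcal{D}(\mathcal{U}))) = \mathrm{Hist}(\mathcal{U})$, we can conclude that $\mathcal{D}^{\mathrm{inv}}(\mathcal{D}(\mathcal{U})) = \mathcal{U}$ as well. ■

We now show that the function $f_{\mathrm{Surj}}$ distinguishes $\mathcal{D}$ from $\mathcal{U}$ with constant bias.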

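Lemma 7 $E_{\mathcal{U}}[f_{\mathrm{Surj}}] - E_{\mathcal{D}}[f_{\mathrm{Surj}}] \ge 1/e - o(1)$.

Proof. By construction, we have $E_{\mathcal{D}}[f_{\mathrm{Surj}}] = 0$. On the other hand,

\[ E_{\mathcal{U}}[f_{\mathrm{Surj}}] = \Pr_{\mathcal{U}}[|\mathrm{Im}_X| = M]. \]

Think of $N = M \ln M + O(1)$ balls, which are thrown uniformly and independently into M bins.
Then $|\mathrm{Im}_X|$ is just the number of bins that receive at least one ball. Using the Poisson
approximation, we have

\[ \lim_{M \to \infty} \Pr_{\mathcal{U}}[|\mathrm{Im}_X| = M] = \frac{1}{e}, \]

and therefore $E_{\mathcal{U}}[f_{\mathrm{Surj}}] \ge 1/e - o(1)$. ■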
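
For small parameters the Poisson estimate can be compared with the exact balls-in-bins probability, computed by inclusion-exclusion over the set of empty bins. The sketch below is only a numerical sanity check; the helper names are hypothetical, and `paper_N` assumes that N is the ceiling of $Mm \ln 2$.

```python
from fractions import Fraction
from math import ceil, comb, exp, log

def surjection_probability(M, N):
    """Exact Pr[N uniform balls hit all M bins], by inclusion-exclusion
    over the number j of bins forced to be empty."""
    total = sum((-1) ** j * comb(M, j) * (M - j) ** N for j in range(M + 1))
    return Fraction(total, M ** N)

def paper_N(m):
    """N = ceil(M * m * ln 2) with M = 2^m (assumed reading of the definition)."""
    M = 2 ** m
    return ceil(M * m * log(2))

def gap_from_limit(m):
    """Distance between the exact probability and the limiting value 1/e."""
    M = 2 ** m
    return abs(float(surjection_probability(M, paper_N(m))) - exp(-1))
```

Already at m = 4 (so M = 16) the exact value lies within a few hundredths of 1/e, although the convergence in the lemma is only asymptotic.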

To show that the distribution $\mathcal{D}$ is almost k-wise independent, we first need a technical claim,
to the effect that almost k-wise independence behaves well with respect to restrictions. Given a
k-term C, let V(C) be the set of variables that occur in C. Also, given a set S of variables that
contains V(C), let $U_S(C)$ be the set of all $2^{|S|-k}$ terms B such that V(B) = S and $B \Rightarrow C$.

Claim 8 Given a k-term C and a set S containing V(C), suppose every term $B \in U_S(C)$ is
$\varepsilon$-fooled by $\mathcal{D}$. Then C is $\varepsilon$-fooled by $\mathcal{D}$.

Proof. It suffices to check the claim in the case |S| = k + 1, since we can then use induction on |S|.
Let $S = V(C) \cup \{x\}$ for some variable $x \notin V(C)$. Then $U_S(C)$ contains two terms: $C_0 := C \wedge \bar{x}$
and $C_1 := C \wedge x$. By the law of total probability, we have $\Pr_{\mathcal{D}}[C] = \Pr_{\mathcal{D}}[C_0] + \Pr_{\mathcal{D}}[C_1]$ and
$\Pr_{\mathcal{U}}[C] = \Pr_{\mathcal{U}}[C_0] + \Pr_{\mathcal{U}}[C_1]$. Hence

\[ \min\left\{ \frac{\Pr_{\mathcal{D}}[C_0]}{\Pr_{\mathcal{U}}[C_0]}, \frac{\Pr_{\mathcal{D}}[C_1]}{\Pr_{\mathcal{U}}[C_1]} \right\} \le \frac{\Pr_{\mathcal{D}}[C]}{\Pr_{\mathcal{U}}[C]} \le \max\left\{ \frac{\Pr_{\mathcal{D}}[C_0]}{\Pr_{\mathcal{U}}[C_0]}, \frac{\Pr_{\mathcal{D}}[C_1]}{\Pr_{\mathcal{U}}[C_1]} \right\}. \]

So if $C_0$ and $C_1$ are both $\varepsilon$-fooled by $\mathcal{D}$, then C is $\varepsilon$-fooled as well. ■

Given an input $X = x_1 \ldots x_N$, recall that $\Delta(x_i, y)$ denotes a term that evaluates to 1 if $x_i = y$,
and to 0 if $x_i \ne y$. Then let a proper k-term C be a product of the form $\Delta(x_{i_1}, y_1) \cdots \Delta(x_{i_k}, y_k)$,
where $1 \le i_1 < \cdots < i_k \le N$ and $y_1, \ldots, y_k \in [M]$.

We now prove the central fact, that $\mathcal{D}$ is almost k-wise independent.

Lemma 9 $\mathcal{D}$ is 2k/M-almost k-wise independent for all $k \le M/2$.

Proof. Notice that a Boolean k-term can involve bits from at most k different $x_i$'s. So by Claim
8, to show that any Boolean k-term is $\varepsilon$-fooled by $\mathcal{D}$, it suffices to show that any proper k-term

\[ C = \Delta(x_{i_1}, y_1) \cdots \Delta(x_{i_k}, y_k) \]

is $\varepsilon$-fooled by $\mathcal{D}$.

We first lower-bound $\Pr_{\mathcal{D}}[C]$. Recall that to sample an input Z from the distribution $\mathcal{D}$, we
first sample an X from $\mathcal{U}$, and then sample Z from $\mathcal{D}(X)$. Suppose C(X) = 1. Then the only
way we can get C(Z) = 0 is if, when we perturb the input X to obtain Z, some $\Delta(x_{i_j}, y_j)$
changes from TRUE to FALSE. But for each $j \in [k]$, this can happen only if $y = y_j$, which occurs
with probability 1/M. So by the union bound,

\[ \Pr_{\mathcal{D}}[C] \ge \Pr_{\mathcal{U}}[C] \cdot \left( 1 - \frac{k}{M} \right). \]

We can similarly lower-bound $\Pr_{\mathcal{U}}[C]$. By Claim 6, to sample an input X from $\mathcal{U}$, we can first
sample a Z from $\mathcal{D}$, and then sample X from $\mathcal{D}^{\mathrm{inv}}(Z)$. Suppose C(Z) = 1. Then we can only
get C(X) = 0 if, when we perturb Z to obtain X, some $\Delta(z_{i_j}, y_j)$ changes from TRUE to FALSE.
But each $z_i$ changes with probability at most 1/M. So by the union bound,

\[ \Pr_{\mathcal{U}}[C] \ge \Pr_{\mathcal{D}}[C] \cdot \left( 1 - \frac{k}{M} \right). \]

Combining the two bounds, and using the fact that $k \le M/2$ (so that $1/(1 - k/M) \le 1 + 2k/M$), we have

\[ 1 - \frac{k}{M} \le \frac{\Pr_{\mathcal{D}}[C]}{\Pr_{\mathcal{U}}[C]} \le 1 + \frac{2k}{M}. \]

■
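
For tiny parameters, Lemma 9 can be checked by brute force: enumerate the distribution over Z exactly with rational arithmetic and compute the ratio of probabilities for every proper k-term. The sketch below is an illustration under the definitions above, with hypothetical function names; its running time is exponential in N, so it is only meant for values such as M = 4, N = 3.

```python
from fractions import Fraction
from itertools import combinations, product

def exact_D(M, N):
    """Exact distribution D = D(U) over [M]^N, returned as {Z: probability}."""
    values = range(1, M + 1)
    dist = {}
    for X in product(values, repeat=N):
        for y in values:
            hit = [i for i in range(N) if X[i] == y]
            others = [v for v in values if v != y]
            # Pr[X] * Pr[y] * Pr[one particular choice of replacements]
            weight = Fraction(1, M ** N * M * (M - 1) ** len(hit))
            for repl in product(others, repeat=len(hit)):
                Z = list(X)
                for i, v in zip(hit, repl):
                    Z[i] = v
                Z = tuple(Z)
                dist[Z] = dist.get(Z, 0) + weight
    return dist

def worst_ratios(M, N, k):
    """Min and max of Pr_D[C] / Pr_U[C] over all proper k-terms C."""
    dist = exact_D(M, N)
    lo, hi = None, None
    for idx in combinations(range(N), k):
        for ys in product(range(1, M + 1), repeat=k):
            p = sum(w for Z, w in dist.items()
                    if all(Z[i] == y for i, y in zip(idx, ys)))
            r = p * M ** k  # Pr_U[C] = M^(-k) for a proper k-term
            lo = r if lo is None else min(lo, r)
            hi = r if hi is None else max(hi, r)
    return lo, hi
```

With M = 4, N = 3, and k = 2, the extreme ratios are 8/9 and 4/3, comfortably inside the interval [1 - k/M, 1 + 2k/M] = [1/2, 2] given by the lemma.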

Combining Lemmas 5, 7, and 9, and recalling that n = Nm, we obtain the following.

Theorem 10 Conjecture 4 (the GLN Conjecture) is false. Indeed, there exists a family of Boolean
functions $f_{\mathrm{Surj}} : \{0,1\}^n \to \{0,1\}$, computable by AC^0 circuits of size $O(n^2)$, depth 3, and
bottom fanin $O(\log n)$, as well as an $O((k \log^2 n)/n)$-almost k-wise independent distribution $\mathcal{D}$ over
$\{0,1\}^n$, such that $E_{\mathcal{U}}[f_{\mathrm{Surj}}] - E_{\mathcal{D}}[f_{\mathrm{Surj}}] = \Omega(1)$.
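
The parameters in Theorem 10 follow from Lemmas 5 and 9 by converting from (M, N, m) to n. A sketch of that arithmetic, with constants not optimized and using only that $N = \Theta(Mm)$ and $M = 2^m$:

\begin{align*}
n &= Nm = \Theta(M m^2), \qquad \log n = \Theta(m), \\
M &= \Theta\!\left(\frac{n}{\log^2 n}\right)
  \;\Longrightarrow\; \frac{2k}{M} = O\!\left(\frac{k \log^2 n}{n}\right), \\
\text{size} &= O(NMm) = O(nM) = O(n^2), \qquad \text{bottom fanin} = m = O(\log n).
\end{align*}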

4 Random Oracle Separations 

In this section, we reuse the function $f_{\mathrm{Surj}}$ and distribution $\mathcal{D}$ from Section 3 to show that
$(\Pi_2^p)^A \not\subset P^{NP^A}$ with probability 1 relative to a random oracle A. The central observation here is simply that
$\mathcal{D}$ has support on a constant fraction of $[M]^N$, and that therefore, any algorithm that computes
$f_{\mathrm{Surj}}(X)$ on a $1 - \varepsilon$ fraction of inputs $X \in [M]^N$ must also distinguish $\mathcal{D}$ from $\mathcal{U}$ with constant
bias. The following lemma makes this implication precise.
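Lemma 11 Let B be a random variable such that $\Pr_{\mathcal{U}}[B = f_{\mathrm{Surj}}] \ge 0.92$. Then $\Pr_{\mathcal{U}}[B] -
\Pr_{\mathcal{D}}[B] \ge 0.022 - o(1)$.

Proof. For convenience, let us adopt the convention that all probabilities are implicitly the limiting
probabilities as $m \to \infty$; this introduces at most an o(1) additive error. Then $\Pr_{\mathcal{U}}[f_{\mathrm{Surj}}] = 1/e$, so

\[ \Pr_{\mathcal{U}}[B] \ge \Pr_{\mathcal{U}}[f_{\mathrm{Surj}}] - \Pr_{\mathcal{U}}[B \ne f_{\mathrm{Surj}}] \ge \frac{1}{e} - 0.08 > 0.287. \]

It remains to upper-bound $\Pr_{\mathcal{D}}[B]$. Using the Poisson approximation, for every fixed integer $k \ge 0$
we have

\[ \Pr_{\mathcal{U}}[|\mathrm{Im}_X| = M - k] = \frac{1}{e \cdot k!}. \]

By comparison, for every fixed $k \ge 1$ we have

\[ \Pr_{\mathcal{D}}[|\mathrm{Im}_X| = M - k] = \frac{1}{e \cdot (k-1)!}. \]

Now, once we condition on the value of $|\mathrm{Im}_X|$, it is not hard to see that the distributions $\mathcal{D}$ and $\mathcal{U}$
are identical. Thus, since

\[ \frac{\Pr_{\mathcal{D}}[|\mathrm{Im}_X| = M - k]}{\Pr_{\mathcal{U}}[|\mathrm{Im}_X| = M - k]} = \frac{e \cdot k!}{e \cdot (k-1)!} = k \]

increases with k, the way to maximize $\Pr_{\mathcal{D}}[B]$ is to set B = 1 for those inputs X such that k is as
large as possible (in other words, such that $|\mathrm{Im}_X|$ is as small as possible). Notice that

\[ \Pr_{\mathcal{U}}[(|\mathrm{Im}_X| < M) \wedge B] \le \Pr_{\mathcal{U}}[B \ne f_{\mathrm{Surj}}] \le 0.08 < \sum_{k=3}^{\infty} \Pr_{\mathcal{U}}[|\mathrm{Im}_X| = M - k]. \]

It follows that

\[ \Pr_{\mathcal{D}}[B] \le \sum_{k=3}^{\infty} \Pr_{\mathcal{D}}[|\mathrm{Im}_X| = M - k] = \sum_{k=3}^{\infty} \frac{1}{e \cdot (k-1)!} = 1 - \frac{2}{e} < 0.265. \]

Combining,

\[ \Pr_{\mathcal{U}}[B] - \Pr_{\mathcal{D}}[B] > 0.287 - 0.265 = 0.022. \]

■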
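
The numerical constants in this proof (0.08, 0.287, 0.265, 0.022) can be double-checked in a few lines. The sketch below evaluates the limiting Poisson expressions only, not the finite-M probabilities; the names are hypothetical, and each infinite sum is truncated after 60 terms, which is far below floating-point precision.

```python
from math import e, factorial

def tail(start, shift, terms=60):
    """Sum over k >= start of 1 / (e * (k - shift)!), truncated after `terms` terms."""
    return sum(1.0 / (e * factorial(k - shift)) for k in range(start, start + terms))

lower_U = 1 / e - 0.08   # lower bound on Pr_U[B]
mass_U = tail(3, 0)      # limiting Pr_U[|Im_X| <= M - 3]
upper_D = tail(3, 1)     # limiting Pr_D[|Im_X| <= M - 3], equal to 1 - 2/e
gap = lower_U - upper_D  # resulting lower bound on Pr_U[B] - Pr_D[B]
```

The margin is thin: the uniform mass on $|\mathrm{Im}_X| \le M - 3$ exceeds the error budget 0.08 only in the fourth decimal place, which is why the threshold 0.92 in the lemma cannot be lowered much with this argument.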



Recall that Lemma 9 showed the distribution $\mathcal{D}$ to be 2k/M-almost k-wise independent.
Examining the proof of Lemma 9, we can actually strengthen the conclusion to the following.
Lemma 12 Let F be a k-DNF formula, with $k \le M/2$. Then

\[ 1 - \frac{k}{M} \le \frac{\Pr_{\mathcal{D}}[F]}{\Pr_{\mathcal{U}}[F]} \le 1 + \frac{2k}{M}. \]

Proof. Let $F = C_1 \vee \cdots \vee C_\ell$. Fix an input $X \in [M]^N$, and suppose F(X) = 1. Then there
must be an $i \in [\ell]$ such that $C_i(X) = 1$. In the proof of Lemma 9, we actually showed that

\[ \Pr_{\mathcal{D}(X)}[C_i] \ge 1 - \frac{k}{M}. \]

It follows that

\[ \Pr_{\mathcal{D}(X)}[F] \ge 1 - \frac{k}{M}, \]

and hence

\[ \Pr_{\mathcal{D}}[F] \ge \Pr_{\mathcal{U}}[F] \left( 1 - \frac{k}{M} \right). \]

Similarly,

\[ \Pr_{\mathcal{U}}[F] \ge \Pr_{\mathcal{D}}[F] \left( 1 - \frac{k}{M} \right). \]

The lemma now follows, using the assumption $k \le M/2$. ■

By combining Lemma 12 with the standard diagonalization tricks of Bennett and Gill [5], we
can now prove a random oracle separation between $\Pi_2^p$ and $P^{NP}$.

Theorem 13 $(\Pi_2^p)^A \not\subset P^{NP^A}$ with probability 1 relative to a random oracle A.

Proof. We will treat the random oracle A as encoding, for each positive integer m, a random
sequence of integers $X_m \in [M]^N$, where $M := 2^m$ and $N := \lceil Mm \ln 2 \rceil$. Let $f_{\mathrm{Surj}} : [M]^N \to \{0,1\}$
be our usual surjectivity function; i.e. $f_{\mathrm{Surj}}(X_m) = 1$ if and only if $\mathrm{Im}_{X_m} = [M]$. Then let L be
a unary language that contains $0^m$ if and only if $f_{\mathrm{Surj}}(X_m) = 1$. Clearly $L \in (\Pi_2^p)^A$. It remains
to show that $L \notin P^{NP^A}$ with probability 1 over A. Fix a $P^{NP}$ machine $B^A$, which runs in time
p(m) for some fixed polynomial p. Also, let $m_1, m_2, \ldots$ be a sequence of input lengths that are
exponentially far apart, so that we do not need to worry about $B^A(0^{m_i})$ querying $X_{m_j}$ for any
j > i. We will treat $X_{m_j}$ as fixed for all j < i, so that only $X := X_m := X_{m_i}$ itself is a random
variable. Then $B^A(0^m)$ makes a sequence of at most p(m) adaptive NTIME(p(m)) queries to X,
call them $Q_1, \ldots, Q_{p(m)}$. For each $t \in [p(m)]$, we can write a p(m)-DNF formula $F_t(X)$ which
evaluates to TRUE if and only if $Q_t(X)$ accepts. Then by Lemma 12, we have

\[ 1 - \frac{p(m)}{M} \le \frac{\Pr_{\mathcal{D}}[F_t]}{\Pr_{\mathcal{U}}[F_t]} \le 1 + \frac{2p(m)}{M}. \]

This implies that

\[ \left| \Pr_{\mathcal{D}}[F_t] - \Pr_{\mathcal{U}}[F_t] \right| \le \frac{2p(m)}{M}. \]

So by the union bound, we have

\[ \left| \Pr_{\mathcal{D}}[B^A(0^m)] - \Pr_{\mathcal{U}}[B^A(0^m)] \right| \le \frac{2p(m)^2}{M} \]

even after we take into account the possible adaptivity of the queries. Clearly $2p(m)^2/M < 0.022$
for all sufficiently large m. So taking the contrapositive of Lemma 11,

\[ \Pr_{\mathcal{U}}[B^A(0^m) = f_{\mathrm{Surj}}(X)] < 0.92 \]

for all sufficiently large m. So as in the standard random oracle argument of Bennett and Gill [5],
we have

\[ \Pr_A[B^A \text{ decides } L] \le \prod_{i=1}^{\infty} \Pr_A[B^A(0^{m_i}) = f_{\mathrm{Surj}}(X_{m_i})] = 0. \]

Then taking the union bound over all $P^{NP^A}$ machines $B^A$,

\[ \Pr_A\left[ L \in P^{NP^A} \right] = 0 \]

as well. ■

It is well-known that $P^{NP^A} = BPP^{NP^A}$ with probability 1 relative to a random oracle A. Thus,
Theorem 13 immediately implies that $(\Pi_2^p)^A \not\subset BPP^{NP^A}$ relative to a random oracle A as well.
Since the class $BPP_{path}$ is contained in $BPP^{NP}$ (as shown by Han, Hemaspaandra, and Thierauf
[11]), we also obtain the new result that $(\Pi_2^p)^A \not\subset BPP_{path}^A$ relative to a random oracle A.

5 Implications for AC^0

In this section, we discuss two implications of our counterexample for AC^0 functions.

(1) Linial, Mansour, and Nisan [12] famously showed that every AC^0 function has average
sensitivity O(polylog n). By contrast, we show in Section 5.1 that there are reasonably-balanced AC^0
functions with average block-sensitivity almost linear in n (on both 0-inputs and 1-inputs). In
other words, there exist AC^0 functions that counterintuitively behave almost like the Parity
function in terms of block-sensitivity!

(2) Linial et al. [12] also showed that every AC^0 function can be approximated in L2-norm by a
low-degree polynomial. By contrast, we show in Section 5.2 that there does not generally
exist such a polynomial that also satisfies a reasonable sparseness condition on the coefficients
(what Aaronson [1] called the "low-fat" condition).

5.1 The Average Block-Sensitivity of AC^0

Let us first recall the definition of average sensitivity. 

Definition 14 (average sensitivity) Given a string $X \in \{0,1\}^n$ and coordinate $i \in [n]$, let $X^i$
denote X with the i-th bit flipped. Then given a Boolean function $f : \{0,1\}^n \to \{0,1\}$, the sensitivity
of f at X, or $s_X(f)$, is the number of i's such that $f(X^i) \ne f(X)$. Then the average sensitivity
of f is

\[ \bar{s}(f) := \mathop{E}_{X \in \{0,1\}^n} [s_X(f)]. \]

Assuming f is non-constant, we can also define the average 0-sensitivity $\bar{s}_0(f)$ and average
1-sensitivity $\bar{s}_1(f)$ respectively, by

\[ \bar{s}_b(f) := \mathop{E}_{X \in \{0,1\}^n \,:\, f(X) = b} [s_X(f)]. \]

Then Linial, Mansour, and Nisan [12] showed that every AC^0 function has low average
sensitivity:

Theorem 15 ([12]) Every Boolean function $f : \{0,1\}^n \to \{0,1\}$ computed by an AC^0 circuit of
depth d satisfies $\bar{s}(f) = O(\log^d n)$.

We now recall the definition of block-sensitivity, a natural generalization of sensitivity introduced
by Nisan [14].

Definition 16 (average block-sensitivity) Given a string $X \in \{0,1\}^n$ and a subset of indices
$B \subseteq [n]$ (called a "block"), let $X^B$ denote X with the bits in B flipped. Then given a Boolean
function $f : \{0,1\}^n \to \{0,1\}$, the block-sensitivity of f at X, or $bs_X(f)$, is the largest k for which
there exist k pairwise-disjoint blocks, $B_1, \ldots, B_k$, such that $f(X^{B_i}) \ne f(X)$ for all $i \in [k]$. Then
the average block-sensitivity of f is

\[ \overline{bs}(f) := \mathop{E}_{X \in \{0,1\}^n} [bs_X(f)]. \]

Assuming f is non-constant, we can also define the average 0-block-sensitivity $\overline{bs}_0(f)$ and average
1-block-sensitivity $\overline{bs}_1(f)$ respectively, by

\[ \overline{bs}_b(f) := \mathop{E}_{X \in \{0,1\}^n \,:\, f(X) = b} [bs_X(f)]. \]
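
To make the definition concrete, block-sensitivity can be computed by brute force for very small n. The sketch below represents inputs and blocks as n-bit integer masks and searches over packings of disjoint sensitive blocks; the function names are hypothetical, and the search is exponential, so it is only meant for n of about 3 or 4.

```python
def block_sensitivity(f, x, n):
    """bs_X(f): the largest number of pairwise-disjoint blocks B with
    f(X^B) != f(X), where X and each block B are n-bit masks."""
    sensitive = [b for b in range(1, 1 << n) if f(x ^ b) != f(x)]

    def best(used, start):
        # Largest disjoint packing using sensitive[start:], avoiding bits in `used`.
        top = 0
        for idx in range(start, len(sensitive)):
            b = sensitive[idx]
            if b & used == 0:
                top = max(top, 1 + best(used | b, idx + 1))
        return top

    return best(0, 0)

def average_block_sensitivity(f, n):
    """Average of bs_X(f) over all 2^n inputs X."""
    return sum(block_sensitivity(f, x, n) for x in range(1 << n)) / (1 << n)

def parity(x):
    return bin(x).count("1") % 2

def or_fn(x):
    return int(x != 0)
```

For example, Parity on 3 bits has block-sensitivity 3 at every input, while OR on 3 bits has block-sensitivity 3 at the all-zeros input but only 1 at the all-ones input, where the single sensitive block is the whole input.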

We consider the following question: does any analogue of Theorem 15 still hold if we replace
sensitivity by block-sensitivity?

We start with some simple observations. Call a Boolean function $f : \{0,1\}^n \to \{0,1\}$
reasonably-balanced if there exist constants $a, b \in (0,1)$ such that $a \le E_{\{0,1\}^n}[f] \le b$ for
every n. Then if we do not require f to be reasonably-balanced, it is easy to find an $f \in AC^0$
such that $\overline{bs}_0(f)$ and $\overline{bs}_1(f)$ are both large. For example, the two-level AND-OR tree satisfies
$\overline{bs}_0(f) = \Theta(\sqrt{n})$ and $\overline{bs}_1(f) = \Theta(\sqrt{n})$.

So let us require f to be reasonably-balanced. Even then, it is easy to find an $f \in AC^0$ such
that $\overline{bs}(f) = \Omega(n/\log n)$. Given an input $X = x_1 \ldots x_N \in [N]^N$, define the Tribes function by
$f_{\mathrm{Tribes}}(X) = 1$ if there exists an $i \in [N]$ such that $x_i = 1$, and $f_{\mathrm{Tribes}}(X) = 0$ otherwise. Then not
only is $f_{\mathrm{Tribes}}$ in AC^0, it has an AC^0 circuit of depth 2 (i.e., a DNF formula). On the other hand,
let X be any 0-input of $f_{\mathrm{Tribes}}$; then we can change X to a 1-input by setting $x_i := 1$ for any i. So

\[ bs_X(f_{\mathrm{Tribes}}) \ge N = \Omega\left( \frac{n}{\log n} \right), \]

where $n := N \log_2 N$ is the bit-length of X. Hence $\overline{bs}_0(f_{\mathrm{Tribes}}) = \Omega(n/\log n)$. Indeed
$\overline{bs}(f_{\mathrm{Tribes}}) = \Omega(n/\log n)$ as well, since

\[ \lim_{N \to \infty} \Pr_X[f_{\mathrm{Tribes}}(X) = 0] = \frac{1}{e}. \]

By contrast, one can check that $\overline{bs}_1(f_{\mathrm{Tribes}})$ is only $O(\log n)$. Indeed, any Boolean function f
that can be represented by a k-DNF formula satisfies $\overline{bs}_1(f) \le k$, since if a particular k-term C is
satisfied, then there are at most k disjoint ways to make it unsatisfied.

The above observations might lead one to ask the following question: does every
reasonably-balanced AC^0 function f satisfy either $\overline{bs}_0(f) = O(\mathrm{polylog}\, n)$ or $\overline{bs}_1(f) = O(\mathrm{polylog}\, n)$? We
now show, alas, that the answer is still no.
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Theorem 17 There exists a reasonably-balanced Boolean function f : {0, l} n —> {0, 1} ; computable 
by a depth-three AC circuit, such that bso (/) = (n/logn) and bsi (/) = SI (n/logn). 

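Proof. Let $f$ be the function $f_{\mathrm{Surj}}$ from our counterexample. As usual, we can think of an input
$X$ to $f_{\mathrm{Surj}}$ as belonging to either $\{0,1\}^n$ or $[M]^N$, where $M = 2^m$, $N = \lceil M m \ln 2 \rceil$, and $n = Nm$.
As in Lemma 11 we have

$$\lim_{M \to \infty} \mathop{E}_{X \in [M]^N} \left[ f_{\mathrm{Surj}} \right] = \frac{1}{e},$$

so $f_{\mathrm{Surj}}$ is reasonably-balanced.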

To lower-bound $\overline{\mathrm{bs}}_1(f_{\mathrm{Surj}})$, consider an input $X = x_1 \ldots x_N \in [M]^N$ such that $f_{\mathrm{Surj}}(X) = 1$, or
equivalently $\mathrm{Im}_X = [M]$. Given $y \in [M]$, let $C_y(X)$ be the set of all $i \in [N]$ such that $x_i = y$.
Then we can change $f_{\mathrm{Surj}}(X)$ from 1 to 0 by changing $x_i$ to an arbitrary element of $[M] \setminus \{y\}$
for each $i \in C_y(X)$. This implies that $\mathrm{bs}_X(f_{\mathrm{Surj}}) \ge M$. Indeed, we can improve the bound to
$\mathrm{bs}_X(f_{\mathrm{Surj}}) \ge Mm$, by noticing that it suffices to change a single bit of $x_i$ for each $i \in C_y(X)$.
Hence

$$\overline{\mathrm{bs}}_1(f_{\mathrm{Surj}}) \ge Mm = \Omega\left(\frac{n}{\log n}\right).$$

Next consider an input $X = x_1 \ldots x_N \in [M]^N$ such that $|\mathrm{Im}_X| = M - 1$. Then clearly $f_{\mathrm{Surj}}(X) = 0$.
Let $A(X)$ be the set of indices $i \in [N]$ for which there exists at least one $j \ne i$ such that $x_i = x_j$.
Then we have $|A(X)| \ge N - M$ by the pigeonhole principle. Also, for any $i \in A(X)$, let $X^i$
be identical to $X$, except that we change $x_i$ to the unique element of $[M] \setminus \mathrm{Im}_X$. Then clearly
$\mathrm{Im}_{X^i} = [M]$ and $f_{\mathrm{Surj}}(X^i) = 1$. Therefore $\mathrm{bs}_X(f_{\mathrm{Surj}}) \ge |A(X)| \ge N - M$. Furthermore, as in
Lemma 11 we have

$$\lim_{M \to \infty} \Pr_{X \in [M]^N} \left[ |\mathrm{Im}_X| = M - 1 \right] = \frac{1}{e}$$

by the Poisson approximation. It follows that

$$\lim_{M \to \infty} \overline{\mathrm{bs}}_0(f_{\mathrm{Surj}}) \ge \frac{1/e}{1 - 1/e} (N - M) = \Omega\left(\frac{n}{\log n}\right).$$
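The two Poisson limits used in this proof can be checked numerically. The sketch below (ours) computes, for uniform $X \in [M]^N$, the exact probability that exactly $k$ elements of $[M]$ are missing from $\mathrm{Im}_X$, using inclusion-exclusion evaluated in log space. It assumes the parameter choice $N = \lceil M m \ln 2 \rceil$ as we read it from the proof; the function names are hypothetical. The alternating sum is numerically well-behaved in this regime, where the expected number of missing elements is about 1, but it is not intended for much smaller $N$.

```python
import math


def surj_parameters(m):
    """Return (M, N, n) with M = 2^m, N = ceil(M * m * ln 2), n = N * m."""
    M = 2 ** m
    N = math.ceil(M * m * math.log(2))
    return M, N, N * m


def log_comb(a, b):
    """Natural log of the binomial coefficient C(a, b)."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def prob_missing(M, N, k):
    """Pr over uniform X in [M]^N that exactly k elements of [M] are not in Im_X."""
    total = 0.0
    for j in range(M - k):  # k + j < M, so the logarithm below is defined
        log_term = log_comb(M - k, j) + N * math.log1p(-(k + j) / M)
        total += (-1) ** j * math.exp(log_term)
    return math.exp(log_comb(M, k)) * total


if __name__ == "__main__":
    M, N, n = surj_parameters(10)
    p_surj = prob_missing(M, N, 0)      # Pr[f_Surj(X) = 1]
    p_one = prob_missing(M, N, 1)       # Pr[|Im_X| = M - 1]
    # Both should be close to 1/e; the last value estimates the weight
    # that inputs with |Im_X| = M - 1 carry among the 0-inputs.
    print(M, N, n, p_surj, p_one, p_one / (1 - p_surj))
```

The last printed ratio corresponds to the factor $\frac{1/e}{1 - 1/e}$ in the final display: it is the fraction of 0-inputs to which the bound $\mathrm{bs}_X(f_{\mathrm{Surj}}) \ge N - M$ applies.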



5.2 The Inapproximability of $\mathsf{AC}^0$ by Low-Fat Polynomials

Let us recall another basic result of Linial, Mansour, and Nisan [12].

Theorem 18 ([12]) Let $f : \{0,1\}^n \to \{0,1\}$ be computed by an $\mathsf{AC}^0$ circuit of depth $d$. Then for
all $\varepsilon > 0$, there exists a multilinear polynomial $p : \{0,1\}^n \to \mathbb{R}$ of degree $O(\log^d (n/\varepsilon))$ such that

$$\mathop{E}_{\mathcal{U}} \left[ (p - f)^2 \right] \le \varepsilon.$$



In this section, we ask whether one can extend Theorem 18 to get an approximating polynomial $p$
that is not merely low-degree, but also representable using coefficients that are bounded in absolute
value. The specific property that we want was called the "low-fat" property by Aaronson [1]:

Definition 19 (low-fat polynomials) Given a multilinear polynomial $p : \{0,1\}^n \to \mathbb{R}$, define
the fat content of $p$, or $\mathrm{fat}(p)$, to be the minimum of $\sum_C |\alpha_C| \, 2^{-|C|}$ over all representations $p =
\sum_C \alpha_C C$ of $p$ as a linear combination of terms (that is, products of $x_i$'s and $(1 - x_i)$'s). Then we
call $p$ low-fat if $\mathrm{fat}(p) = n^{o(1)}$.
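As a small illustration of Definition 19, the sketch below (ours) evaluates a polynomial given as an explicit linear combination of terms and computes the weight $\sum_C |\alpha_C| 2^{-|C|}$ of that particular representation. Since $\mathrm{fat}(p)$ is a minimum over all representations, this weight is only an upper bound on $\mathrm{fat}(p)$; computing the minimum itself is not attempted here. The encoding of a term as a tuple of `(index, is_positive)` literals on distinct variables, and all function names, are assumptions made for the example.

```python
def term_value(term, x):
    """Evaluate a term: literal (i, True) means x_i, (i, False) means 1 - x_i."""
    value = 1
    for i, positive in term:
        value *= x[i] if positive else 1 - x[i]
    return value


def evaluate(representation, x):
    """Evaluate p(x) = sum over (alpha, term) pairs of alpha * term(x)."""
    return sum(alpha * term_value(term, x) for alpha, term in representation)


def representation_weight(representation):
    """Sum of |alpha_C| * 2^(-|C|) for this representation.

    This is an upper bound on fat(p), not fat(p) itself.
    """
    return sum(abs(alpha) * 2.0 ** (-len(term)) for alpha, term in representation)


# Example: p(x) = x_0 * x_1 + (1 - x_0), written with two terms.
rep = [
    (1.0, ((0, True), (1, True))),
    (1.0, ((0, False),)),
]
```

Here the weight is $1 \cdot 2^{-2} + 1 \cdot 2^{-1} = 0.75$, so $\mathrm{fat}(p) \le 0.75$ for this $p$.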
One motivation for Definition 19 comes from [1], where it was pointed out that the Generalized
Linial-Nisan Conjecture is equivalent (via linear programming duality) to the following conjecture:



Conjecture 20 (Low-Fat Sandwich Conjecture) Let $f : \{0,1\}^n \to \{0,1\}$ be computed by an
$\mathsf{AC}^0$ circuit of size $2^{n^{o(1)}}$. Then there exist low-fat multilinear polynomials $p_\ell, p_u : \{0,1\}^n \to \mathbb{R}$ of
degree $n^{o(1)}$, that "sandwich" $f$ in the following sense:

(i) $p_\ell(X) \le f(X) \le p_u(X)$ for all $X \in \{0,1\}^n$ and
(ii) $\mathop{E}_{\mathcal{U}} [p_u - p_\ell] = o(1)$.

Without the adjective "low-fat," Conjecture 20 would be equivalent to the original Linial-Nisan
Conjecture, as shown by Bazzi [2]. And indeed, Braverman [7] heavily exploited this equivalence
in his proof of the original LN Conjecture.

Of course, from the fact that the GLN Conjecture is false, we can immediately deduce that
Conjecture 20 is false as well.

On the other hand, the notion of low-fat polynomials seems interesting even apart from Conjecture 20, since the low-fat condition is a kind of "sparseness" condition, which might be useful (for
example) in learning theory. Furthermore, the falsehood of Conjecture 20 does not directly rule
out the possibility of low-fat approximating polynomials for every $\mathsf{AC}^0$ function, since Conjecture
20 talks only about sandwiching polynomials. However, with a bit more work, we now show the
existence of an $\mathsf{AC}^0$ function that has no low-fat, low-degree approximating polynomial of any kind.

Theorem 21 There exists a Boolean function $f : \{0,1\}^n \to \{0,1\}$, computable by a depth-three
$\mathsf{AC}^0$ circuit, for which any multilinear polynomial $p : \{0,1\}^n \to \mathbb{R}$ that satisfies $\mathop{E}_{\mathcal{U}}[(p - f)^2] = o(1)$
also satisfies $\deg(p) \, \mathrm{fat}(p) = \Omega(n/\log^2 n)$.

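Proof. Once again we let $f = f_{\mathrm{Surj}}$. Let $p$ be a multilinear polynomial such that $\mathop{E}_{\mathcal{U}}[(p - f)^2] \le
\varepsilon$. By definition, we can write $p$ as a linear combination of terms, $p = \sum_C \alpha_C C$, such that
$\sum_C |\alpha_C| \mathop{E}_{\mathcal{U}}[C] = \mathrm{fat}(p)$. Hence

$$\begin{aligned}
\left| \mathop{E}_{\mathcal{U}}[p] - \mathop{E}_{\mathcal{D}}[p] \right| &= \left| \sum_C \alpha_C \left( \mathop{E}_{\mathcal{U}}[C] - \mathop{E}_{\mathcal{D}}[C] \right) \right| \\
&\le \frac{2 \, \mathrm{fat}(p) \deg(p)}{M},
\end{aligned}$$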



(Footnote: Technically, Braverman constructed polynomials that satisfied slightly different properties than (i) and (ii) from
Conjecture 20. However, we know from Bazzi's equivalence theorem [2] that it must be possible to satisfy those
properties as well.)
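where the second line follows from Lemma 9. Also, let $\Delta := p - f_{\mathrm{Surj}}$. Then as in the proof of
Lemma 11 we have

$$\begin{aligned}
\mathop{E}_{\mathcal{U}}[\Delta^2] &= \sum_{k=0}^{M} \Pr_{\mathcal{U}} \left[ |\mathrm{Im}_X| = M - k \right] \cdot \mathop{E}_{\mathcal{U}} \left[ \Delta^2 \,\middle|\, |\mathrm{Im}_X| = M - k \right] \\
&\ge \sum_{k=0}^{M} \left( \frac{1}{e \cdot k!} - o(1) \right) \mathop{E}_{\mathcal{U}} \left[ \Delta^2 \,\middle|\, |\mathrm{Im}_X| = M - k \right],
\end{aligned}$$

whereas

$$\mathop{E}_{\mathcal{D}}[\Delta^2] = \sum_{k=0}^{M} \Pr_{\mathcal{D}} \left[ |\mathrm{Im}_X| = M - k \right] \cdot \mathop{E}_{\mathcal{D}} \left[ \Delta^2 \,\middle|\, |\mathrm{Im}_X| = M - k \right].$$

Combining, we find that

$$\mathop{E}_{\mathcal{D}}[\Delta^2] = O\left( \varepsilon \log \frac{1}{\varepsilon} \right) + o(1).$$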



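Hence

$$\begin{aligned}
\mathop{E}_{\mathcal{U}}[f_{\mathrm{Surj}}] - \mathop{E}_{\mathcal{D}}[f_{\mathrm{Surj}}] &= \left( \mathop{E}_{\mathcal{U}}[p] - \mathop{E}_{\mathcal{U}}[\Delta] \right) - \left( \mathop{E}_{\mathcal{D}}[p] - \mathop{E}_{\mathcal{D}}[\Delta] \right) \\
&\le \left| \mathop{E}_{\mathcal{U}}[p] - \mathop{E}_{\mathcal{D}}[p] \right| + \mathop{E}_{\mathcal{U}}[|\Delta|] + \mathop{E}_{\mathcal{D}}[|\Delta|] \\
&\le \frac{2 \, \mathrm{fat}(p) \deg(p)}{M} + O\left( \sqrt{\varepsilon \log \frac{1}{\varepsilon}} \right) + o(1),
\end{aligned}$$

where the third line follows from Cauchy-Schwarz. On the other hand, we know from the earlier lemma about $f_{\mathrm{Surj}}$
that

$$\mathop{E}_{\mathcal{U}}[f_{\mathrm{Surj}}] - \mathop{E}_{\mathcal{D}}[f_{\mathrm{Surj}}] \ge \frac{1}{e} - o(1).$$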

So combining, if $\varepsilon = o(1)$, then

$$\mathrm{fat}(p) \deg(p) = \Omega\left( \frac{M}{e} \right) = \Omega\left( \frac{n}{\log^2 n} \right).$$

■

Since $f_{\mathrm{Surj}}$ has a depth-three $\mathsf{AC}^0$ circuit, it follows from Theorem 18 that there exists a polynomial $p$ of degree $O(\log^3 n)$ such that $\mathop{E}_{\mathcal{U}}[(p - f_{\mathrm{Surj}})^2] = o(1)$. Thus, one corollary of Theorem 21
is a separation between low-degree approximation and low-degree low-fat approximation. In other
words, there exists a Boolean function $f$ (namely $f_{\mathrm{Surj}}$) that can be well-approximated in $L_2$-norm
by a polynomial of degree $O(\mathrm{polylog}\, n)$, but not by a low-fat polynomial of degree $O(\mathrm{polylog}\, n)$.
This answers one of the open problems from [1].
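For readers who want the last step of the proof of Theorem 21 spelled out, the following is our own short derivation, not text from the original. It assumes $\varepsilon = o(1)$, writes $\mathcal{U}$ for the uniform distribution and $\mathcal{D}$ for the distribution from the counterexample, and uses the parameter choice $M = 2^m$, $N = \lceil M m \ln 2 \rceil$, $n = Nm$ as we read it from the proof of Theorem 17.

```latex
% Step 1: combine the upper and lower bounds on E_U[f_Surj] - E_D[f_Surj].
\begin{align*}
\frac{1}{e} - o(1)
  &\le \mathop{E}_{\mathcal{U}}[f_{\mathrm{Surj}}] - \mathop{E}_{\mathcal{D}}[f_{\mathrm{Surj}}]
   \le \frac{2\,\mathrm{fat}(p)\deg(p)}{M}
      + O\!\left(\sqrt{\varepsilon \log \tfrac{1}{\varepsilon}}\right) + o(1), \\
\text{so}\quad \mathrm{fat}(p)\deg(p)
  &\ge \frac{M}{2}\left(\frac{1}{e} - o(1)\right) = \Omega(M).
\end{align*}

% Step 2: express M in terms of the input length n.
\begin{align*}
n = Nm = \Theta(M m^2) = \Theta(M \log^2 M),
\qquad \log n = \Theta(\log M),
\qquad\text{hence}\quad
M = \Theta\!\left(\frac{n}{\log^2 n}\right).
\end{align*}
```

Substituting the second step into the first gives $\mathrm{fat}(p)\deg(p) = \Omega(n/\log^2 n)$, matching the statement of the theorem.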
6 Discussion



As we said before, we remain sanguine about the prospects for proving an oracle separation between
BQP and PH. In our view, the lesson of our counterexample is simply that almost $k$-wise independence is too blunt of an instrument for this problem. Looking at the specific function $f_{\mathrm{Surj}}$ in
the counterexample, we find two arguments in support of this position. Firstly, $f_{\mathrm{Surj}}$ is extremely
different in character from Fourier Checking, or any of the other candidates for problems in
BQP \ PH (such as the ones studied by Fefferman and Umans [9]). Indeed, $f_{\mathrm{Surj}}$ is not even in
BQP, as can be seen from the BBBV lower bound [4] for example. Secondly, $f_{\mathrm{Surj}}$ is trivially in PH
by construction, and for that reason, our counterexample does not really say anything unexpected
about "the power of PH." To us, the unexpected part is simply the inability of approximate local
statistics to "certify" a problem as outside PH, where exact local statistics succeed in doing so (as
shown by Braverman [7]). But this is a surprise about proof techniques, not about complexity
classes.

The obvious open problems are 

(1) to solve the relativized BQP versus PH problem by whatever means, and 

(2) to solve the relativized BQP versus AM problem, possibly by proving the depth-two GLN 
Conjecture. 

We reiterate our offer of a $200 prize for problem (1) and a $100 prize for problem (2). 

A third interesting problem is to show that our function $f_{\mathrm{Surj}}(X)$ cannot be computed in $\Sigma_2^p$,
on a $1 - \varepsilon$ fraction of inputs $X \in [M]^N$. This would imply that $(\Pi_2^p)^A \not\subset (\Sigma_2^p)^A$ with probability 1
relative to a random oracle $A$. A fourth problem is whether one can say anything nontrivial about
the block-sensitivity of $\mathsf{AC}^0$ functions: for example, that every $f \in \mathsf{AC}^0$ has average block-sensitivity
$O(n/\log n)$.